Coding theory based models for protein translation initiation in prokaryotic organisms.
Abstract
Our research explores the feasibility of using communication theory, specifically error control (EC) coding theory, to quantitatively model the protein translation initiation mechanism. The messenger RNA (mRNA) of Escherichia coli K-12 is modeled as a noisy (errored), encoded signal and the ribosome as a minimum Hamming distance decoder, where the 16S ribosomal RNA (rRNA) serves as a template for generating a set of valid codewords (the codebook). We tested the E. coli based coding models on 5' untranslated leader sequences of prokaryotic organisms of varying taxonomic relatedness to E. coli, including Salmonella typhimurium LT2, Bacillus subtilis, and Staphylococcus aureus Mu50. The model identified regions on the 5' untranslated leader where the minimum Hamming distance values of translated mRNA sub-sequences and non-translated genomic sequences differ the most. These regions correspond to the Shine-Dalgarno domain and the non-random domain. Applying the EC coding-based models to B. subtilis and S. aureus Mu50 yielded results similar to those for E. coli K-12. Contrary to our expectations, the behavior of S. typhimurium LT2, the organism most closely related taxonomically to E. coli, resembled that of the non-translated sequence group.
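The minimum Hamming distance decoding idea in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the template string, the codeword length of 5, the sliding-window codebook construction, and the function names `hamming`, `codebook_from_template`, and `min_distance_profile` are all assumptions chosen for illustration. The sketch slides a window along a leader sequence and records, at each position, the smallest Hamming distance to any codeword.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def codebook_from_template(template: str, n: int) -> set:
    """Hypothetical codebook: every length-n window of the template."""
    if n <= 0 or n > len(template):
        raise ValueError("n must be between 1 and len(template)")
    return {template[i:i + n] for i in range(len(template) - n + 1)}


def min_distance_profile(leader: str, codebook: set) -> list:
    """Minimum Hamming distance to the codebook at each window start."""
    n = len(next(iter(codebook)))  # all codewords share one length
    return [
        min(hamming(leader[i:i + n], word) for word in codebook)
        for i in range(len(leader) - n + 1)
    ]


# Illustrative template (assumption): a Shine-Dalgarno-like string written
# as the mRNA-side complement of a 16S rRNA 3' tail. Toy data only.
TEMPLATE = "UAAGGAGGUGAUC"
codebook = codebook_from_template(TEMPLATE, 5)

# Toy leader with an exact codeword ("AGGAG") starting at index 4.
leader = "CCCCAGGAGCCCC"
profile = min_distance_profile(leader, codebook)
best = profile.index(min(profile))
print("distance profile:", profile)
print("best-matching window starts at index", best)
```

In this toy setting, low values in the profile mark windows that resemble a codeword. Comparing such profiles between translated leaders and non-translated sequences is the kind of contrast the abstract describes, though the actual codes and codebook used in the paper may differ from this sketch.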
Related articles
Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale
Ribosome profiling or ribo-seq is a new technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome protected mRNA fragments allowing the measurement of ribosome density along all RNA molecules present in the cell. At the same time, the high resolution of this technique allows detailed analysis of ribosome density on indiv...
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-...
Predicting Translation Initiation Rates for Designing Synthetic Biology
In synthetic biology, precise control over protein expression is required in order to construct functional biological systems. A core principle of the synthetic biology approach is model-guided design, and models of prokaryotic protein production have been described based on the biological understanding of the process. Translation initiation rate is a rate-limiting step in protein production ...
A comparative genomic method for computational identification of prokaryotic translation initiation sites.
The ever growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for comparative genomic analysis and demonstrates its utility in the context of improving the accuracy of prokaryotic gene start site detection. Our framework employs a product hidden Markov model (PROD-H...
Prediction Rate of Coding Regions is Enhanced upto 99.15 % by Joint Use of GeneMark-RC and GeneHacker in Case of a Cyanobacterium
The advancement in large-scale sequencing has accelerated the production of long contiguous nucleotide sequence data. The whole genomic sequence data is currently available for several prokaryotic organisms. The first step in the analysis of genomic sequence data is to assign coding regions, which is absolutely necessary for a comparative study of one organism with the others and to elucidate com...
Journal title:
- Bio Systems
Volume 76, Issue 1-3
Pages: -
Publication date: 2004